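Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.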
translated by Google Translate
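We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we observe only a single historical trajectory for each agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrades significantly given such limited data availability, even for benchmark settings commonly perceived as "solved", such as "MountainCar" and "CartPole". To address this challenge, we propose a model-based offline RL approach that first learns a personalized simulator for each agent by collectively using the historical trajectories of all agents, prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well approximated by a "low-rank" decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture that effectively learns the transition dynamics of each agent, even with scarce, offline data. We perform extensive experiments across several benchmark environments and RL methods. The consistent improvement of our approach, measured in terms of state dynamics prediction and eventual reward, confirms the efficacy of our framework in leveraging limited historical data to simultaneously learn personalized policies across agents.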
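As a rough illustration of the "low-rank" claim, the factorization could be written as below. All symbols are assumptions introduced here for illustration and are not taken from the abstract: $\theta_n$, $\rho_s$, and $\omega_a$ stand for the latent factors of agent $n$, state $s$, and action $a$; $f$ is the latent transition function; $d$ is the assumed rank; and $u_\ell$, $v_\ell$, $w_\ell$ are the separable agent, state, and action latent functions.

```latex
% Hypothetical notation: next state of agent n after taking action a in state s
s'_{n} = f(\theta_n, \rho_s, \omega_a)
       \approx \sum_{\ell=1}^{d} u_\ell(\theta_n)\, v_\ell(\rho_s)\, w_\ell(\omega_a)
```

Under this reading, each of the three factors can be learned by its own small network and combined by a sum of products, which is one way to interpret the "simple, regularized neural network architecture" the abstract mentions.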
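The rise in global food demand and harsh working conditions make fruit harvesting an important domain to automate. Peduncle localization is an important step for any automated fruit harvesting system, since fruit separation techniques are highly sensitive to peduncle location. Most work on peduncle localization has focused on computer vision, but peduncles can be difficult to access visually because agricultural environments are cluttered. Our work proposes an alternative method that relies on mechanical, rather than visual, perception to localize the peduncle. To estimate the location of this important plant feature, we fit wrench measurements from a wrist force/torque sensor to a physical model of the fruit-plant system, treating the fruit's attachment point as a parameter to be tuned. The method is performed inline as part of the fruit picking procedure. Using our orchard proxy for evaluation, we demonstrate that the technique localizes the peduncle within a median distance of 3.8 cm and with a median orientation error of 16.8 degrees.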
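The diversity of terrestrial plants plays a key role in maintaining stable, healthy, and productive ecosystems. Although remote sensing is seen as a promising and cost-effective proxy for estimating plant diversity, quantitative studies on how well plant diversity can be inferred from spaceborne hyperspectral data are lacking. In this study, we assess the ability of hyperspectral data captured by the DLR Earth Sensing Imaging Spectrometer (DESIS) to estimate plant species richness in the Southern Tablelands and Snowy Mountains regions of southeast Australia. Spectral features were first extracted from DESIS spectra with principal component analysis, canonical correlation analysis, and partial least squares analysis. Regression was then conducted between the extracted features and plant species richness using ordinary least squares regression, kernel ridge regression, and Gaussian process regression. Results were assessed with the correlation coefficient ($r$) and the root-mean-square error (RMSE), based on a two-fold cross-validation scheme. With the best-performing model, $r$ is 0.71 and RMSE is 5.99 for the Southern Tablelands region, while $r$ is 0.62 and RMSE is 6.20 for the Snowy Mountains region. The assessment results reported in this study support future studies on the relationship between spaceborne hyperspectral measurements and terrestrial plant biodiversity.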
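The evaluation pipeline described above (feature extraction, regression, two-fold cross-validation, $r$ and RMSE) can be sketched for one of the nine combinations, PCA followed by ordinary least squares. This is a minimal sketch, not the authors' implementation: the function names, the number of components, the random fold split, the pooling of held-out predictions, and the synthetic data are all assumptions made here for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def ols_fit(Z, y):
    """Least-squares weights for y ~ [1, Z] (intercept plus features)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def ols_predict(Z, w):
    return np.column_stack([np.ones(len(Z)), Z]) @ w

def two_fold_cv(X, y, n_components=3, seed=0):
    """Two-fold CV; returns Pearson r and RMSE on pooled held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 2)
    pred = np.empty(len(y), dtype=float)
    for k in range(2):
        test, train = folds[k], folds[1 - k]
        # Fit PCA and OLS on the training fold only, to avoid leakage.
        mean, axes = pca_fit(X[train], n_components)
        w = ols_fit((X[train] - mean) @ axes.T, y[train])
        pred[test] = ols_predict((X[test] - mean) @ axes.T, w)
    r = np.corrcoef(y, pred)[0, 1]
    rmse = np.sqrt(np.mean((y - pred) ** 2))
    return r, rmse

# Synthetic stand-in data (NOT DESIS spectra): 80 plots, 40 spectral bands.
rng = np.random.default_rng(1)
latent = rng.normal(size=(80, 3))
spectra = latent @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(80, 40))
richness = 20 + 4 * latent[:, 0] - 3 * latent[:, 1] + 0.5 * rng.normal(size=80)

r, rmse = two_fold_cv(spectra, richness)
print(f"r = {r:.2f}, RMSE = {rmse:.2f}")
```

Fitting the PCA inside each fold, rather than once on all data, keeps the held-out plots out of the feature extraction step; whether the study did the same is not stated in the abstract.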
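Dormant pruning of fruit trees is an important task for maintaining tree health and ensuring high-quality fruit. Due to decreasing labor availability, pruning is a prime candidate for robotic automation. However, pruning is also a uniquely difficult problem for robots, requiring perception, pruning point determination, and manipulation under variable lighting conditions and in complex, highly unstructured environments. In this paper, we introduce a system for pruning sweet cherry trees (in a planar tree architecture called an upright fruiting offshoot configuration) that integrates various subsystems from our previous work on perception and manipulation. The resulting system is capable of operating completely autonomously and requires minimal control of the environment. We validate the performance of our system through field trials in a sweet cherry orchard, ultimately achieving a cutting success rate of 58%. Though not fully robust and in need of throughput improvements, our system is the first to operate on fruit trees and represents a useful base platform to be improved in the future.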
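In this paper, we examine how different interfaces and feedback affect human-robot trust levels when operating in a shared physical space. The task we use is specifying a "no-go" region for a robot in an indoor environment. We evaluate three styles of interface (physical, AR, and map-based) and four feedback mechanisms (no feedback, the robot moving around the space, an AR "fence", and the region marked on the map). Our evaluation looks at both usability and trust, specifically whether the participant trusts that the robot "knows" where the no-go region is, and how confident they are in the robot's ability to avoid that region. We use both self-reported and indirect measures of trust and usability. Our key findings are: 1) interfaces and feedback do influence levels of trust; 2) participants largely preferred a mixed interface-feedback pair, in which the modality of the interface differed from that of the feedback.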
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the regression problem for BIQA harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has addressed learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to better optimize the regression problem by following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which holds back graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we use as user features the 20 user property features with the greatest information gain, together with user tweet features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
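Selecting the properties "with the greatest information gain" can be illustrated with the textbook definition, IG(Y; X) = H(Y) - H(Y | X). The sketch below is an assumption-laden illustration, not MGTAB's code: it assumes features are already discrete (continuous properties would need binning first), and the function names and toy data are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature value."""
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each subset's entropy by the fraction of samples it covers.
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

def top_k_features(X, y, k):
    """Indices of the k columns of X with the highest information gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return order[:k], gains

# Toy data: column 0 equals the label, column 1 is independent of it,
# column 2 is partially informative.
y = np.array([0, 0, 1, 1])
X = np.array([[0, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1]])
selected, gains = top_k_features(X, y, k=2)
print(selected, gains)
```

In this toy example the column that duplicates the label receives the full one bit of gain and the independent column receives none, which is the ranking behavior a top-20 selection would rely on.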
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent, neural-network-based PDE solver, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest that KoopmanLab could be useful in diverse applications of partial differential equations.